CHIMERIC GENE CONSTRUCTS FOR GENERATION OF FLUORESCENT
TRANSGENIC ORNAMENTAL FISH

FIELD OF THE INVENTION

This invention relates to fish gene promoters and chimeric gene constructs with these
promoters for generation of transgenic fish, particularly fluorescent transgenic ornamental
fish.

BACKGROUND OF THE INVENTION

Transgenic technology involves the transfer of a foreign gene into a host organism
enabling the host to acquire a new and inheritable trait. The technique was first developed
in mice by Gordon et al. (1980). They injected foreign DNA into fertilized eggs and found
that some of the mice developed from the injected eggs retained the foreign DNA.
Applying the same technique, Palmiter et al. (1982) have introduced a chimeric gene
containing a rat growth hormone gene under a mouse heavy metal-inducible gene promoter
and generated the first batch of genetically engineered supermice, which are almost twice
as large as non-transgenic siblings. This work has opened a promising avenue in using the
transgenic approach to provide to animals new and beneficial traits for livestock husbandry
and aquaculture.

In addition to the stimulation of somatic growth for increasing the gross production
of animal husbandry and aquaculture, transgenic technology also has many other potential
applications. First of all, transgenic animals can be used as bioreactors to produce
commercially useful compounds by expression of a useful foreign gene in milk or in blood.
Many pharmaceutically useful protein factors have been expressed in this way. For
example, human α1-antitrypsin, which is commonly used to treat emphysema, has been
expressed at a concentration as high as 35 mg/ml (10% of milk proteins) in the milk of
transgenic sheep (Wright et al., 1991). Similarly, the transgenic technique can also be used
to improve the nutritional value of milk by selectively increasing the levels of certain
valuable proteins such as caseins and by supplementing certain new and useful proteins
such as lysozyme for antimicrobial activity (Maga and Murray, 1995). Second, transgenic
mice have been widely used in medical research, particularly in the generation of
transgenic animal models for human disease studies (Lathe and Mullins, 1993). More
recently, it has been proposed to use transgenic pigs as organ donors for
xenotransplantation by expressing human regulators of complement activation to prevent
hyperacute rejection during organ transplantation (Cozzi and White, 1995). The
development of disease resistant animals has also been tested in transgenic mice (e.g. Chen
et al., 1988).

Fish are also an intensive research subject of transgenic studies. There are many
ways of introducing a foreign gene into fish, including: microinjection (e.g. Zhu et al.,
1985; Du et al., 1992), electroporation (Powers et al., 1992), sperm-mediated gene transfer
(Khoo et al., 1992; Sin et al., 1993), gene bombardment or gene gun (Zelenin et al., 1991),
liposome-mediated gene transfer (Szelei et al., 1994), and the direct injection of DNA into
muscle tissue (Xu et al., 1999). The first transgenic fish report was published by Zhu et al.
(1985) using a chimeric gene construct consisting of a mouse metallothionein gene
promoter and a human growth hormone gene. Most of the early transgenic fish studies
have concentrated on growth hormone gene transfer with an aim of generating fast growing
"superfish". A majority of early attempts used heterologous growth hormone genes and
promoters and failed to produce gigantic superfish (e.g. Chourrout et al., 1986; Penman et
al., 1990; Brem et al., 1988; Gross et al., 1992). But enhanced growth of transgenic fish has
been demonstrated in several fish species including Atlantic salmon, several species of
Pacific salmon, and loach (e.g. Du et al., 1992; Devlin et al., 1994, 1995; Tsai et al.,
1995).

The zebrafish, Danio rerio, is a new model organism for vertebrate developmental
biology. As an experimental model, the zebrafish offers several major advantages such as
easy availability of eggs and embryos, tissue clarity throughout embryogenesis, external
development, short generation time and easy maintenance of both the adult and the young.
Transgenic zebrafish have been used as an experimental tool in zebrafish developmental
biology. However, despite the fact that the first transgenic zebrafish was reported a decade
ago (Stuart et al., 1988), most transgenic zebrafish work conducted so far used
heterologous gene promoters or viral gene promoters: e.g. viral promoters from SV40
(simian virus 40) and RSV (Rous sarcoma virus) (Stuart et al., 1988, 1990; Bayer and
Campos-Ortega, 1992), a carp actin promoter (Liu et al., 1990), and mouse homeobox gene
promoters (Westerfield et al., 1992). As a result, the expression pattern of a transgene in
many cases is variable and unpredictable.

GFP (green fluorescent protein) was isolated from a jellyfish, Aequorea victoria.
The wild type GFP emits green fluorescence at a wavelength of 508 nm upon stimulation
with ultraviolet light (395 nm). The primary structure of GFP has been elucidated by
cloning of its cDNA and genomic DNA (Prasher et al., 1992). A modified GFP, also called
EGFP (Enhanced Green Fluorescent Protein), has been generated artificially. It contains
mutations that allow the protein to emit a stronger green light, and its coding sequence has
also been optimized for higher expression in mammalian cells based on preferable human
codons. As a result, EGFP fluorescence is about 40 times stronger than the wild type GFP
in mammalian cells (Yang et al., 1996). GFP (including EGFP) has become a popular tool
in cell biology and transgenic research. By fusing GFP with a tested protein, the GFP
fusion protein can be used as an indicator of the subcellular location of the tested protein
(Wang and Hazelrigg, 1994). By transformation of cells with a functional GFP gene, the
GFP can be used as a marker to identify expressing cells (Chalfie et al., 1994). Thus, the
GFP gene has become an increasingly popular reporter gene for transgenic research as GFP
can be easily detected by a non-invasive approach.

The GFP gene (including EGFP gene) has also been introduced into zebrafish in
several previous reports by using various gene promoters, including Xenopus elongation
factor 1α enhancer-promoter (Amsterdam et al., 1995, 1996), rat myosin light-chain
enhancer (Moss et al., 1996), zebrafish GATA-1 and GATA-3 promoters (Meng et al., 1997;
Long et al., 1997), zebrafish α- and β-actin promoters (Higashijima et al., 1997), and
tilapia insulin-like growth factor I promoter (Chen et al., 1998). All of these transgenic
experiments aim at either developing a GFP transgenic system for gene expression analysis
or at testing regulatory DNA elements in gene promoters.

SUMMARY OF THE INVENTION 

It is a primary objective of the invention to clone fish gene promoters that are 
constitutive (ubiquitous), or that have tissue specificity such as skin specificity or muscle 
specificity, or that are inducible by a chemical substance, and to use these promoters to
develop effective gene constructs for production of transgenic fish. 

It is another objective of the invention to develop fluorescent transgenic ornamental
fish using these gene constructs. By applying different gene promoters, tissue-specific,
inducible under different environmental conditions, or ubiquitous, to drive the GFP gene,
GFP could be expressed in different tissues or ubiquitously. Thus, these transgenic fish
may be skin fluorescent, muscle fluorescent, ubiquitously fluorescent, or inducibly
fluorescent. These transgenic fish may be used for ornamental purposes, for monitoring
environmental pollution, and for basic studies such as recapitulation of gene expression
programs or monitoring cell lineage and cell migration. These transgenic fish may be used
for cell transplantation and nuclear transplantation or fish cloning.

Other objectives, features and advantages of the present invention will become
apparent from the detailed description which follows, or may be learned by practice of the
invention.

Four zebrafish gene promoters of different characteristics were isolated and four
chimeric gene constructs containing a zebrafish gene promoter and EGFP DNA were
made: pCK-EGFP, pMCK-EGFP, pMLC2f-EGFP and pARP-EGFP. The first chimeric
gene construct, pCK-EGFP, contains a 2.2 kbp polynucleotide comprising a zebrafish
cytokeratin (CK) gene promoter which is specifically or predominantly expressed in skin
epithelia. The second one, pMCK-EGFP, contains a 1.5 kbp polynucleotide comprising a
muscle-specific promoter from a zebrafish muscle creatine kinase (MCK) gene and the
gene is only expressed in the muscle tissue. The third construct, pMLC2f-EGFP, contains a
2.2 kbp polynucleotide comprising a strong skeletal muscle-specific promoter from the fast
skeletal muscle isoform of the myosin light chain 2 (MLC2f) gene and is expressed
specifically or predominantly in skeletal muscle. The fourth chimeric gene construct,
pARP-EGFP, contains a strong and ubiquitously expressed promoter from a zebrafish
acidic ribosomal protein (ARP) gene. These four chimeric gene constructs have been
introduced into zebrafish at the one cell stage or two cell stage by microinjection. In all
cases, the GFP expression patterns were consistent with the specificities of the promoters.
GFP was predominantly expressed in skin epithelia with pCK-EGFP, specifically
expressed in muscles with pMCK-EGFP, specifically expressed in skeletal muscles with
pMLC2f-EGFP and ubiquitously expressed in all tissues with pARP-EGFP.

These chimeric gene constructs are useful to generate green fluorescent transgenic
fish. The GFP transgenic fish emit green fluorescent light under a blue or ultraviolet light
and this feature makes the genetically engineered fish unique and attractive in the
ornamental fish market. The fluorescent transgenic fish are also useful for the development
of a biosensor system and as research models for embryonic studies such as cell lineage,
cell migration, cell and nuclear transplantation, etc.

BRIEF DESCRIPTION OF THE DRAWINGS

Figs. 1A-1I are photographs showing expression of CK (Figs. 1A-1C), MCK (Figs.
1D-1E), ARP (Figs. 1F-1G) and MLC2f (Figs. 1H-1I) mRNAs in zebrafish embryos as
revealed by whole mount in situ hybridization (detailed description of the procedure can be
found in Thisse et al., 1993). (Fig. 1A) A 28 hpf (hours postfertilization) embryo hybridized
with a CK antisense riboprobe. (Fig. 1B) Enlargement of the mid-part of the embryo
shown in Fig. 1A. (Fig. 1C) Cross-section of the embryo in Fig. 1A. (Fig. 1D) A 30 hpf
embryo hybridized with an MCK antisense riboprobe. (Fig. 1E) Cross-section of the
embryo in Fig. 1D. (Fig. 1F) A 28 hpf embryo hybridized with an ARP antisense riboprobe.
(Fig. 1G) Cross-section of the embryo in Fig. 1F. Arrows indicate the planes for
cross-sections and the box in panel A indicates the enlarged region shown in panel B. (Fig. 1H)
Side view of a 22-hpf embryo hybridized with the MLC2f probe. (Fig. 1I) Transverse
section through the trunk of a stained 24-hpf embryo. SC, spinal cord; N, notochord.

Fig. 2A is a digitized image showing distribution of CK, MCK and ARP mRNAs in
adult tissues. Total RNAs were prepared from selected adult tissues as indicated at the top
of each lane and analyzed by Northern blot hybridization (detailed description of the
procedure can be found in Gong et al., 1992). Three identical blots were made from the
same set of RNAs and hybridized with the CK, MCK and ARP probes, respectively.

Fig. 2B is a digitized image showing distribution of MLC2f mRNA in adult tissues.
Total RNAs were prepared from selected adult tissues as indicated at the top of each lane
and analyzed by Northern blot hybridization (detailed description of the procedure can be
found in Gong et al., 1992). Two identical blots were made from the same set of RNAs and
hybridized with the MLC2f probe and a ubiquitously expressed β-actin probe,
respectively.

Fig. 3 is a schematic representation of the strategy of promoter cloning. Restriction
enzyme digested genomic DNA was ligated with a short linker DNA which consists of
Oligo 1 and Oligo 2. Nested PCR reactions were then performed: the first round PCR used
linker specific primer L1 and gene specific primer G1, where G1 is CK1, MCK1, M1 or
ARP1 in the described embodiments, and the second round used linker specific primer L2 and
gene specific primer G2, where G2 is CK2, MCK2, M2 or ARP2, respectively, in the
described embodiments.

Fig. 4 is a schematic map of the chimeric gene construct, pCK-EGFP. The 2.2 kb
zebrafish DNA fragment comprising the CK promoter region is inserted into pEGFP-1
(Clontech) at the EcoRI and BamHI sites as indicated. In the resulting chimeric DNA
construct, the EGFP gene is under control of the zebrafish CK promoter. Also shown is the
kanamycin/neomycin resistance gene (Kan^r/Neo^r) in the backbone of the original pEGFP-1
plasmid. The total length of the recombinant plasmid pCK-EGFP is 6.4 kb.

Fig. 5 is a schematic map of the chimeric gene construct, pMCK-EGFP. The 1.5 kb
zebrafish DNA fragment comprising the MCK promoter region is inserted into pEGFP-1
(Clontech) at the EcoRI and BamHI sites as indicated. In the resulting chimeric DNA
construct, the EGFP gene is under control of the zebrafish MCK promoter. Also shown is
the kanamycin/neomycin resistance gene (Kan^r/Neo^r) in the backbone of the original
pEGFP-1 plasmid. The total length of the recombinant plasmid pMCK-EGFP is 5.7 kb.

Fig. 6 is a schematic map of the chimeric gene construct, pARP-EGFP. The 2.2 kb
zebrafish DNA fragment comprising the ARP promoter/1st intron region is inserted into
pEGFP-1 (Clontech) at the EcoRI and BamHI sites as indicated. In the resulting chimeric
DNA construct, the EGFP gene is under control of the zebrafish ARP promoter. Also
shown is the kanamycin/neomycin resistance gene (Kan^r/Neo^r) in the backbone of the
original pEGFP-1 plasmid. The total length of the recombinant plasmid pARP-EGFP is 6.4
kb.

Fig. 7 is a schematic map of the chimeric gene construct, pMLC2f-EGFP. The 2.0
kb zebrafish DNA fragment comprising the MLC2f promoter region is inserted into
pEGFP-1 (Clontech) at the HindIII and BamHI sites as indicated. In the resulting chimeric
DNA construct, the EGFP gene is under control of the zebrafish MLC2f promoter. Also
shown is the kanamycin/neomycin resistance gene (Kan^r/Neo^r) in the backbone of the
original pEGFP-1 plasmid. The total length of the recombinant plasmid pMLC2f-EGFP is
6.2 kb.

Fig. 8 is a photograph of a typical transgenic zebrafish fry (4 days old) with pCK- 
EGFP, which emits green fluorescence from skin epithelia under a blue light. 

Fig. 9 is a photograph of a typical transgenic zebrafish fry (3 days old) with
pMCK-EGFP, which emits green fluorescence from skeletal muscles under a blue light.

Fig. 10 is a photograph of a typical transgenic zebrafish fry (2 days old) with 
pARP-EGFP, which emits green fluorescence under a blue light from a variety of cell 
types such as skin epithelia, muscle cells, lens, neural tissues, notochord, circulating blood 
cells and yolk cells. 

Figs. 11A-11B are photographs of a typical transgenic zebrafish founder with
pMLC2f-EGFP (Fig. 11A) and an F1 stable transgenic offspring (Fig. 11B). Both pictures
were taken under an ultraviolet light (365 nm). The green fluorescence can be better
observed under a blue light with an optimal wavelength of 488 nm.

Figs. 12A-12C. Examples of high, moderate and low expression of GFP in
transiently transgenic embryos at 72 hpf. (Fig. 12A) High expression: GFP expression was
detected in essentially 100% of the muscle fibers in the trunk. (Fig. 12B) Moderate
expression: GFP expression was detected in several bundles of muscle fibers, usually in the
mid-trunk region. (Fig. 12C) Low expression: GFP expression occurred in dispersed
muscle fibers and the number of GFP positive fibers was usually fewer than 20 per embryo.

Fig. 13. Deletion analysis of the MLC2f promoter in transient transgenic zebrafish
embryos. A series of 5' deletions of MLC2f-EGFP constructs containing -2011-bp (2-kb),
-1338-bp, -873-bp, -283-bp, -77-bp and -3-bp of the MLC2f promoter were generated by
unidirectional deletion using the double-stranded Nested Deletion Kit from Pharmacia
according to the manufacturer's instruction manual. Each construct was injected into
approximately 100 embryos and GFP expression was monitored in the first 72 hours of
embryonic development. The level of GFP expression was classified based on the
examples shown in Figs. 12A-12C. Potential E-boxes and MEF2 binding sites, which are
important for muscle-specific transcription (Schwarz et al., 1993; Olson et al., 1995), are
indicated on the -2011-bp construct.

DETAILED DESCRIPTION OF THE INVENTION 
Gene Constructs 

To develop successful transgenic fish with a predictable pattern of transgene
expression, the first step is to make a gene construct suitable for transgenic studies. The
gene construct generally comprises three portions: a gene promoter, a structural gene and
transcriptional termination signals. The gene promoter would determine where, when and
under what conditions the structural gene is turned on. The structural gene contains protein
coding portions that determine the protein to be synthesized and thus the biological
function. The structural gene might also contain intron sequences which can affect mRNA
stability or which might contain transcription regulatory elements. The transcription
termination signals consist of two parts: a polyadenylation signal and a transcriptional
termination signal after the polyadenylation signal. Both are important to terminate the
transcription of the gene. Among the three portions, selection of a promoter is very
important for successful transgenic study, and it is preferable to use a homologous
promoter (homologous to the host fish) to ensure accurate gene activation in the transgenic
host.

A promoter drives expression "predominantly" in a tissue if expression is at least
2-fold, preferably at least 5-fold, higher in that tissue compared to a reference tissue. A
promoter drives expression "specifically" in a tissue if the level of expression is at least
5-fold, preferably at least 10-fold, more preferably at least 50-fold higher in that tissue
than in any other tissue.
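
The fold-change definitions above can be written as a simple check. The following is a minimal sketch and not part of the original disclosure: the function name, the dictionary of expression levels and the example values are hypothetical, and the default thresholds are the minimum values (2-fold and 5-fold) given in the definitions.

```python
def classify_promoter(levels, tissue, reference,
                      predominant_fold=2.0, specific_fold=5.0):
    """Apply the fold-change definitions to measured expression levels.

    levels    -- dict mapping tissue name to expression level (arbitrary units)
    tissue    -- tissue being evaluated
    reference -- reference tissue used for the "predominantly" comparison
    """
    target = levels[tissue]
    # "Predominantly": at least predominant_fold higher than the reference tissue.
    predominant = target >= predominant_fold * levels[reference]
    # "Specifically": at least specific_fold higher than in every other tissue.
    specific = all(target >= specific_fold * level
                   for name, level in levels.items() if name != tissue)
    return {"predominant": predominant, "specific": specific}


# Hypothetical measurements, for illustration only.
example = {"skin": 30.0, "muscle": 10.0, "liver": 8.0}
result = classify_promoter(example, tissue="skin", reference="muscle")
```

With these made-up values, skin expression is 3-fold that of the reference tissue, so the sketch reports predominant but not specific expression. The stricter preferred thresholds can be supplied through the two keyword arguments.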

Recombinant DNA Constructs 

Recombinant DNA constructs comprising one or more of the DNA or RNA
sequences described herein and an additional DNA and/or RNA sequence are also included
within the scope of this invention. These recombinant DNA constructs usually have
sequences which do not occur in nature or exist in a form that does not occur in nature or
exist in association with other materials that do not occur in nature. The DNA and/or RNA
sequences described as constructs or in vectors above are "operably linked" with other
DNA and/or RNA sequences. DNA regions are operably linked when they are functionally
related to each other. For example, DNA for a presequence or secretory leader is operably
linked to DNA for a polypeptide if it is expressed as part of a preprotein which participates
in the secretion of the polypeptide; a promoter is operably linked to a coding sequence if it
controls the transcription of the coding sequence; a ribosome binding site is operably
linked to a coding sequence if it is positioned so as to permit translation. Generally,
operably linked means contiguous (or in close proximity to) and, in the case of secretory
leaders, contiguous and in reading phase.

The sequences of some of the DNAs, and the corresponding proteins encoded by
the DNA, which are useful in the invention are set forth in the attached Sequence Listing.

The complete cytokeratin (CK) cDNA sequence is shown in SEQ ID NO:1, and its
deduced amino acid sequence is shown in SEQ ID NO:2. The binding sites of the gene
specific primers for promoter amplification, CK1 and CK2, are indicated. The extra
nucleotides introduced into CK2 for generation of a restriction site are shown as a
misc_feature in the primer sequence SEQ ID NO:11. A potential polyadenylation signal,
AATAAA, is indicated in SEQ ID NO:1.

The complete muscle creatine kinase (MCK) cDNA sequence is shown in SEQ ID 
NO:3, and its deduced amino acid sequence is shown in SEQ ID NO:4. The binding sites 
of the gene specific primers for promoter amplification, MCK1 and MCK2, are indicated. 
The extra nucleotides introduced into MCK1 and MCK2 for generation of restriction sites
are shown as a misc_feature in the primer sequences SEQ ID NOS: 12 and 13, respectively. 
A potential polyadenylation signal, AATAAA, is indicated in SEQ ID NO:3. 

The complete fast skeletal muscle isoform of myosin light chain 2 (MLC2f) cDNA
sequence is shown in SEQ ID NO:20, and its deduced amino acid sequence is shown in
SEQ ID NO:21. The binding sites of the gene-specific primers for promoter amplification,
M1 and M2, are indicated. Two potential polyadenylation signals, AATAAA, are shown as
a misc_feature in SEQ ID NO:20.

The complete acidic ribosomal protein P0 (ARP) cDNA sequence is shown in SEQ
ID NO:5, and its deduced amino acid sequence is shown in SEQ ID NO:6. The binding
sites of the gene specific primers for promoter amplification, ARP1 and ARP2, are
indicated. The extra nucleotides introduced into ARP2 for generation of a restriction site
are shown as a misc_feature in the primer sequence SEQ ID NO:15. A potential
polyadenylation signal, AATAAA, is indicated in SEQ ID NO:5.

SEQ ID NO:7 shows the complete sequence of the CK promoter region. A putative
TATA box is shown, and the 3' nucleotides identical to the 5' CK cDNA sequence are
shown as a misc_feature. The binding site of the second gene specific primer, CK2, is
shown. The introduced BamHI site is indicated as a misc_feature in the primer sequence
SEQ ID NO:11.

SEQ ID NO:8 shows the complete sequence of the MCK promoter region. A
putative TATA box is shown, and the 3' nucleotides identical to the 5' MCK cDNA
sequence are shown as a misc_feature in SEQ ID NO:8. The binding site of the second
gene specific primer, MCK2, is shown. The introduced BamHI site is indicated as a
misc_feature in the primer sequence SEQ ID NO:13.

SEQ ID NO:22 shows the complete sequence of the MLC2f promoter region. A
putative TATA box is shown, and the 3' nucleotides identical to the 5' MLC2f cDNA
sequence are shown as a misc_feature. The binding site of the second gene-specific primer,
M2, is shown. Potential muscle-specific cis-elements, E-boxes and MEF2 binding sites, are
also shown. The proximal 1-kb region of the MLC2f promoter was recently published (Xu
et al., 1999).

SEQ ID NO:9 shows the complete sequence of the ARP promoter region including
the first intron. The first intron is shown, and the 3' nucleotides identical to the 5' ARP
cDNA sequence are shown as misc_features. No typical TATA box is found. The binding
site of the second gene specific primer, ARP2, is shown. The introduced BamHI site is
indicated as a misc_feature in the primer sequence SEQ ID NO:15.

Specifically Exemplified Polypeptides/DNA 

The present invention contemplates use of DNA that codes for various polypeptides
and other types of DNA to prepare the gene constructs of the present invention. DNA that
codes for structural proteins, such as fluorescent peptides including GFP, EGFP, BFP,
EBFP, YFP, EYFP, CFP, ECFP and enzymes (such as luciferase, β-galactosidase,
chloramphenicol acetyltransferase, etc.), and hormones (such as growth hormone etc.), are
useful in the present invention. More particularly, the DNA may code for polypeptides
comprising the sequences exemplified in SEQ ID NOS:2, 4, 6 and 21. The present
invention also contemplates use of particular DNA sequences, including regulatory
sequences, such as promoter sequences shown in SEQ ID NOS:7, 8, 9 and 22 or portions
thereof effective as promoters. Finally, the present invention also contemplates the use of
additional DNA sequences, described generally herein or described in the references cited
herein, for various purposes.

Chimeric Genes

The present invention also encompasses chimeric genes comprising a promoter 
described herein operatively linked to a heterologous gene. Thus, a chimeric gene can 
comprise a promoter of a zebrafish operatively linked to a zebrafish structural gene other 
than that normally found linked to the promoter in the genome. Alternatively, the 
promoter can be operatively linked to a gene that is exogenous to a zebrafish, as 
exemplified by the GFP and other genes specifically exemplified herein. Furthermore, a 
chimeric gene can comprise an exogenous promoter linked to any structural gene not 
normally linked to that promoter in the genome of an organism. 

Variants of Specifically Exemplified Polypeptide 

DNA that codes for variants of the specifically exemplified polypeptides is also
encompassed by the present invention. Possible variants include allelic variants and
corresponding polypeptides from other organisms, particularly other organisms of the same 
species, genus or family. The variants may have substantially the same characteristics as 
the natural polypeptides. The variant polypeptide will possess the primary property of 
concern for the polypeptide. For example, the polypeptide will possess one or more or all 
of the primary physical (e.g., solubility) and/or biological (e.g., enzymatic activity, 
physiologic activity or fluorescence excitation or emission spectrum) properties of the 
reference polypeptide. DNA of the structural genes of the present invention will encode a 
protein that produces a fluorescent or chemiluminescent light under conditions appropriate 
to the particular polypeptide in one or more tissues of a fish. Preferred tissues for 
expression are skin, muscle, eye and bone. 

Substitutions, Additions and Deletions 

As possible variants of the above specifically exemplified polypeptides, the 
polypeptide may have additional individual amino acids or amino acid sequences inserted 
into the polypeptide in the middle thereof and/or at the N-terminal and/or C-terminal ends
thereof so long as the polypeptide possesses the desired physical and/or biological 
characteristics. Likewise, some of the amino acids or amino acid sequences may be deleted 
from the polypeptide so long as the polypeptide possesses the desired physical and/or 
biochemical characteristics. Amino acid substitutions may also be made in the sequences 
so long as the polypeptide possesses the desired physical and biochemical characteristics. 
DNA coding for these variants can be used to prepare gene constructs of the present 
invention.

Sequence Identity

The variants of polypeptides or polynucleotides contemplated herein should 
possess more than 75% sequence identity (sometimes referred to as homology), preferably 
more than 85% identity, most preferably more than 95% identity, even more preferably 
more than 98% identity to the naturally occurring and/or specifically exemplified
sequences or fragments thereof described herein. To determine this homology, two 
sequences are aligned so as to obtain a maximum match using gaps and inserts. 

Two sequences are said to be "identical" if the sequence of residues is the same 
when aligned for maximum correspondence as described below. The term
"complementary" applies to nucleic acid sequences and is used herein to mean that the
sequence is complementary to all or a portion of a reference polynucleotide sequence. 

Optimal alignment of sequences for comparison can be conducted by the local
homology algorithm of Smith and Waterman (1981), by the homology alignment method
of Needleman and Wunsch (1970), by the search for similarity method of Pearson and
Lipman (1988), or the like. Computer implementations of the above algorithms are
known as part of the Genetics Computer Group (GCG) Wisconsin Genetics Software
Package (GAP, BESTFIT, BLASTA, FASTA and TFASTA), 575 Science Drive, Madison,
WI. These programs are preferably run using default values for all parameters.

"Percentage of sequence identity" is determined by comparing two optimally
aligned sequences over a comparison window, wherein the portion of the sequence in the
comparison window may comprise additions or deletions (i.e. "gaps") as compared to the
reference sequence for optimal alignment of the two sequences being compared. The
percentage identity is calculated by determining the number of positions at which the
identical residue occurs in both sequences to yield the number of matched positions,
dividing the number of matched positions by the total number of positions in the window
and multiplying the result by 100 to yield the percentage of sequence identity. Total
identity is then determined as the average identity over all of the windows that cover the
complete query sequence.
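
The calculation described above can be sketched in a few lines. This is an illustrative sketch only, not the GCG implementation: it assumes the two sequences have already been optimally aligned, that gaps are written as "-", and that a gap position counts toward the window length but never as a match. The function names and example sequences are hypothetical.

```python
def window_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over one comparison window of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # Matched positions: the identical residue occurs in both sequences.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != gap)
    # Divide by the total number of positions in the window, times 100.
    return 100.0 * matches / len(aligned_a)


def total_identity(windows):
    """Average identity over all windows covering the query sequence."""
    scores = [window_identity(a, b) for a, b in windows]
    return sum(scores) / len(scores)


# Hypothetical aligned windows, for illustration only.
one_window = window_identity("ACGT-A", "ACCTTA")        # 4 matches in 6 positions
overall = total_identity([("ACGT", "ACGT"), ("AC-T", "ACGA")])
```

In the first example four of six positions match, giving roughly 66.7%; in the second, windows scoring 100% and 50% average to 75%.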

Fragments of Polypeptide 

Genes which code for fragments of the full length polypeptides such as proteolytic
cleavage fragments which contain at least one, and preferably all, of the above-listed
physical and/or biological properties are also encompassed by the present invention.




WO 00/49150 PCT/SG99/00079 


DNA and RNA 

The invention encompasses DNA that codes for any one of the above-described
polypeptides including, but not limited to, those shown in SEQ ID NOS:2, 4, 6 and 21
including fusion polypeptides, variants and fragments thereof. The sequences of certain
particularly useful cDNAs which encode polypeptides are shown in SEQ ID NOS:1, 3, 5
and 20. The present invention also includes cDNA as well as genomic DNA containing or
comprising the requisite nucleotide sequences as well as corresponding RNA and antisense
sequences.

Cloned DNA within the scope of the invention also includes allelic variants of the
specific sequences presented in the attached Sequence Listing. An "allelic variant" is a
sequence that is a variant from that of the exemplified nucleotide sequence, but represents
the same chromosomal locus in the organism. In addition to those which occur by normal
genetic variation in a population and perhaps fixed in the population by standard breeding
methods, allelic variants can be produced by genetic engineering methods. A preferred
allelic variant is one that is found in a naturally occurring organism, including a laboratory
strain. Allelic variants are either silent or expressed. A silent allele is one that does not
affect the phenotype of the organism. An expressed allele results in a detectable change in
the phenotype of the trait represented by the locus.

A nucleic acid sequence "encodes" or "codes for" a polypeptide if it directs the 
expression of the polypeptide referred to. The nucleic acid can be DNA or RNA. Unless
otherwise specified, a nucleic acid sequence that encodes a polypeptide includes the 
transcribed strand, the hnRNA and the spliced RNA or the DNA representative of the 
mRNA. An "antisense" nucleic acid is one that is complementary to all or part of a strand 
representative of mRNA, including untranslated portions thereof. 

Degenerate Sequences

In accordance with the degeneracy of the genetic code, it is possible to substitute at least
one base of the base sequence of a gene by another kind of base without causing the amino
acid sequence of the polypeptide produced from the gene to be changed. Hence, the DNA
of the present invention may also have any base sequence that has been changed by
substitution in accordance with the degeneracy of the genetic code.
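
The degeneracy argument above can be illustrated with a minimal Python sketch. The two 15-base sequences are hypothetical and are not taken from the Sequence Listing; the codon table is the standard genetic code, which is assumed here to apply.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence into one-letter amino acids ('*' = stop)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical sequences (illustration only).
original = "ATGGCTCTGAAATAA"  # ATG GCT CTG AAA TAA
variant = "ATGGCCTTAAAGTAA"   # ATG GCC TTA AAG TAA (synonymous codons)

print(translate(original), translate(variant))
print(count_differences(original, variant), "bases differ")
```

In this sketch the two sequences differ at several bases, yet both translate to the same peptide, which is the sense in which a base sequence "changed by substitution in accordance with the degeneracy of the genetic code" still encodes the same polypeptide.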

DNA Modification 

The DNA is readily modified by substitution, deletion or insertion of nucleotides,
thereby resulting in novel DNA sequences encoding the polypeptide or its derivatives.
These modified sequences are used to produce mutant polypeptide and to directly express
the polypeptide. Methods for saturating a particular DNA sequence with random mutations
and also for making specific site-directed mutations are known in the art; see e.g.
Sambrook et al. (1989).

Hybridizable Variants

The DNA molecules useful in accordance with the present invention can comprise a
nucleotide sequence selected from the group consisting of SEQ ID NOS:1, 3, 5, 7-20 and
22-24 or can comprise a nucleotide sequence that hybridizes to a DNA molecule
comprising the nucleotide sequence of SEQ ID NOS:1, 3, 5 or 20 under salt and
temperature conditions providing stringency at least as high as that equivalent to 5x SSC
and 42°C and that codes on expression for a polypeptide that has one or more or all of the
above-described physical and/or biological properties. The present invention also includes
polypeptides coded for by these hybridizable variants. The relationship of stringency to
hybridization and wash conditions and other considerations of hybridization can be found
in Chapters 11 and 12 of Sambrook et al. (1989). The present invention also encompasses
functional promoters which hybridize to SEQ ID NOS:7, 8, 9 or 22 under the above-described
conditions. DNA molecules of the invention will preferably hybridize to
reference sequences under more stringent conditions allowing the degree of mismatch
represented by the degrees of sequence identity enumerated above. The present invention
also encompasses functional primers or linker oligonucleotides set forth in SEQ ID
NOS:10-19 and 23-24 or larger primers comprising these sequences, or sequences which
hybridize with these sequences under the above-described conditions. The primers usually
have a length of 10-50 nucleotides, preferably 15-35 nucleotides, more preferably 18-30
nucleotides.

Vectors

The invention is further directed to a replicable vector containing cDNA that codes
for the polypeptide and that is capable of expressing the polypeptide.

The present invention is also directed to a vector comprising a replicable vector and
a DNA sequence corresponding to the above described gene inserted into said vector. The
vector may be an integrating or non-integrating vector depending on its intended use and is
conveniently a plasmid.

Transformed Cells 

The invention further relates to a transformed cell or microorganism containing
cDNA or a vector which codes for the polypeptide or a fragment or variant thereof and that
is capable of expressing the polypeptide.

Expression Systems Using Vertebrate Cells 

Interest has been great in vertebrate cells, and propagation of vertebrate cells in
culture (tissue culture) has become a routine procedure. Examples of vertebrate host cell
lines useful in the present invention preferably include cells from any of the fish described
herein. Expression vectors for such cells ordinarily include (if necessary) an origin of
replication, a promoter located upstream from the gene to be expressed, along with a
ribosome-binding site, RNA splice site (if intron-containing genomic DNA is used or if an
intron is necessary to optimize expression of a cDNA), a polyadenylation site, and a
transcription termination sequence.

EXAMPLES 

The following examples are provided by way of illustration only and not by way of 
limitation. Those of skill will readily recognize a variety of noncritical parameters which 
can be changed or modified to yield essentially similar results. 

Example I: Isolation of skin-specific, muscle-specific and ubiquitously expressed
zebrafish cDNA clones.

cDNA clones were isolated and sequenced as described by Gong et al. (1997).
Briefly, random cDNA clones were selected from zebrafish embryonic and adult cDNA
libraries and each clone was partially sequenced by a single sequencing reaction. The
partial sequences were then used to identify the sequenced clones for potential function and
tissue specificity. Of the distinct clones identified by this approach, four were
selected: for skin specificity (clone A39 encoding cytokeratin, CK), for muscle specificity
(clone E146 encoding muscle creatine kinase, MCK), for skeletal muscle specificity (clone
A113 encoding the fast skeletal muscle isoform of the myosin light chain 2, MLC2f) and
for ubiquitous expression (clone A150 encoding acidic ribosomal protein P0, ARP),
respectively.

The four cDNA clones were sequenced, and their complete cDNA sequences with
deduced amino acid sequences are shown in SEQ ID NOS:1, 3, 5, and 20 respectively. A39
encodes a type II basic cytokeratin and its closest homolog in mammals is cytokeratin 8
(65-68% amino acid identity). E146 codes for the zebrafish MCK and its amino acid
sequence shares ~87% identity with mammalian MCKs. A113 encodes the fast skeletal
muscle isoform of the myosin light chain 2. The deduced amino acid sequence of this gene
is highly homologous to other vertebrate fast skeletal muscle MLC2f proteins (over 80%
amino acid identity). The amino acid sequence of zebrafish ARP deduced from the A150
clone is 87-89% identical to those of mammalian ARPs.

To demonstrate their expression patterns, whole mount in situ hybridization (Thisse
et al., 1993) was performed for developing embryos and Northern blot analyses (Gong et
al., 1992) were carried out for selected adult tissues and for developing embryos.

As indicated by whole mount in situ hybridization, cytokeratin mRNA was
specifically expressed in the embryonic surface (Figs. 1A-1C) and cross sections of in situ
hybridized embryos confirmed that the expression was only in skin epithelia (Fig. 1C).
Ontogenetically, the cytokeratin mRNA appeared before 4 hours post-fertilization (hpf)
and it is likely that the transcription of the cytokeratin gene starts at mid-blastula transition
when the zygotic genome is activated. By in situ hybridization, a clear cytokeratin mRNA
signal was detected in highly flattened cells of the superficial layer in blastula and the
expression remained in the superficial layer which eventually developed into skin epithelia
including the yolk sac. In adult tissues, cytokeratin mRNA was predominantly detected in
the skin and also weakly in several other tissues including the eye, gill, intestine and
muscle, but not in the liver and ovary (Fig. 2). Therefore, the cytokeratin mRNA is
predominantly, if not specifically, expressed in skin cells.

MCK mRNA was first detected in the first few anterior somites in 10-somite stage
embryos (14 hpf) and at later stages the expression was specifically in skeletal muscle (Fig.
1D) and in heart (data not shown). When the stained embryos were cross-sectioned, the
MCK mRNA signal was found exclusively in the trunk skeletal muscles (Fig. 1E). In adult
tissues, MCK mRNA was detected exclusively in the skeletal muscle (Fig. 2).

MLC2f mRNA was specifically expressed in fast skeletal muscle in developing
zebrafish embryos (Figs. 1H-1I). To examine the tissue distribution of MLC2f mRNA, total
RNAs were prepared from several adult tissues including heart, brain, eyes, gills, intestine,
liver, skeletal muscle, ovary, skin, and testis. MLC2f mRNA was only detected in the
skeletal muscle by Northern analysis, while a-actin mRNA was detected ubiquitously in
the same set of RNAs, confirming the validity of the assay (Fig. 2B).

ARP mRNA was expressed ubiquitously and it is presumably a maternal mRNA
since it is present in the ovary as well as in embryos at the one-cell stage. In in situ
hybridization experiments, an intense hybridization signal was detected in most tissues. An
example of a hybridized embryo at 28 hpf is shown in Fig. 1F. In adults, ARP mRNA was
abundantly expressed in all tissues examined except for the brain where a relatively weak
signal was detected (Fig. 2A). These observations confirmed that the ARP mRNA is
expressed ubiquitously.

Example II: Isolation of zebrafish gene promoters

Four zebrafish gene promoters were isolated by a linker-mediated PCR method as
described by Liao et al. (1997) and as exemplified by the diagrams in Fig. 3. The whole
procedure includes the following steps: 1) designing of gene specific primers; 2) isolation
of zebrafish genomic DNA; 3) digestion of genomic DNA by a restriction enzyme; 4)
ligation of a short linker DNA to the digested genomic DNA; 5) PCR amplification of the
promoter region; and 6) DNA sequencing to confirm the cloned DNA fragment. The
following is the detailed description of these steps.

1. Designing of gene specific primers 

Gene specific PCR primers were designed based on the 5' end of the four cDNA
sequences and the regions used for designing the primers are shown in SEQ ID NOS:1, 3,
5 and 20.

The two cytokeratin gene specific primers are:

CK1 (SEQ ID NO:10)

CK2 (SEQ ID NO:11), where the first six nucleotides are for creation of an EcoRI site to
facilitate cloning.

The two muscle creatine kinase gene specific primers are:

MCK1 (SEQ ID NO:12), where the first five nucleotides are for creation of an EcoRI site
to facilitate cloning.

MCK2 (SEQ ID NO:13), where the first three nucleotides are for creation of an EcoRI site
to facilitate cloning.

The two fast skeletal muscle isoform of myosin light chain 2 gene specific primers are:

M1 (SEQ ID NO:23)

M2 (SEQ ID NO:24)

The two acidic ribosomal protein P0 gene specific primers are:

ARP1 (SEQ ID NO:14)

ARP2 (SEQ ID NO:15), where the first six nucleotides are for creation of an EcoRI site to
facilitate cloning.

2. Isolation of zebrafish genomic DNA

Genomic DNA was isolated from a single individual fish by a standard method (Sambrook
et al., 1989). Generally, an adult fish was quickly frozen in liquid nitrogen and ground into
powder. The ground tissue was then transferred to an extraction buffer (10 mM Tris, pH 8,
0.1 M EDTA, 20 µg/ml RNase A and 0.5% SDS) and incubated at 37°C for 1 hour.
Proteinase K was added to a final concentration of 100 µg/ml and gently mixed until the
mixture appeared viscous, followed by incubation at 50°C for 3 hours with periodic
swirling. The genomic DNA was gently extracted three times by phenol equilibrated with
Tris-HCl (pH 8), precipitated by adding 0.1 volume of 3 M NaOAc and 2.5 volumes of
ethanol, and collected by swirling on a glass rod, then rinsed in 70% ethanol.

3. Digestion of genomic DNA by a restriction enzyme 

Genomic DNA was digested with the selected restriction enzymes. Generally, 500
units of restriction enzyme were used to digest 50 µg of genomic DNA overnight at the
optimal enzyme reaction temperature (usually at 37°C).

4. Ligation of a short linker DNA to the digested genomic DNA 

The linker DNA was assembled by annealing equal moles of the two linker
oligonucleotides, Oligo 1 (SEQ ID NO:16) and Oligo 2 (SEQ ID NO:17). Oligo 2 was
phosphorylated by T4 polynucleotide kinase prior to annealing. Restriction enzyme
digested genomic DNA was filled-in or trimmed with T4 DNA polymerase, if necessary,
and ligated with the linker DNA. Ligation was performed with 1 µg of digested genomic
DNA and 0.5 µg of linker DNA in a 20 µl reaction containing 10 units of T4 DNA ligase
at 4°C overnight.

5. PCR amplification of the promoter region

PCR was performed with Advantage Tth Polymerase Mix (Clontech). The first
round of PCR was performed using a linker specific primer L1 (SEQ ID NO:18) and a
gene specific primer G1 (CK1, MCK1, M1 or ARP1). Each reaction (50 µl) contained 5 µl
of 10x Tth PCR reaction buffer (1x = 15 mM KOAc, 40 mM Tris, pH 9.3), 2.2 µl of 25
mM Mg(OAc)2, 5 µl of 2 mM dNTP, 1 µl of L1 (0.2 µg/µl), 1 µl of G1 (0.2 µg/µl), 33.8
µl of H2O, 1 µl (50 ng) of linker-ligated genomic DNA and 1 µl of 50x Tth
polymerase mix (Clontech). The cycling conditions were as follows: 94°C/1 min, 35
cycles of 94°C/30 sec and 68°C/6 min, and finally 68°C/8 min. After the primary round of
PCR was completed, the products were diluted 100 fold. One µl of diluted PCR product
was used as template for the second round of PCR (nested PCR) with a second linker
specific primer L2 (SEQ ID NO:19) and a second gene specific primer G2 (CK2, MCK2,
M2 or ARP2), as described for the primary PCR but with the following modification:
94°C/1 min, 25 cycles of 94°C/30 sec and 68°C/6 min, and finally 68°C/8 min. Both the
primary and secondary PCR products were analyzed on a 1% agarose gel.
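
As a consistency check on the reaction set-up described above, the following Python sketch sums the component volumes, derives final concentrations from the stated stocks, and estimates the programmed hold time of each PCR round. It assumes the garbled volume units in the text are microliters and it ignores thermocycler ramp times; the function names are illustrative only.

```python
# Component volumes (µl) of one primary PCR reaction, as listed in the text.
reaction_ul = {
    "10x Tth PCR reaction buffer": 5.0,
    "25 mM Mg(OAc)2": 2.2,
    "2 mM dNTP": 5.0,
    "primer L1 (0.2 µg/µl)": 1.0,
    "primer G1 (0.2 µg/µl)": 1.0,
    "H2O": 33.8,
    "linker-ligated genomic DNA (50 ng)": 1.0,
    "50x Tth polymerase mix": 1.0,
}

total_ul = round(sum(reaction_ul.values()), 1)


def final_conc(stock_conc, volume_ul, total=total_ul):
    """Final concentration after dilution, in the same unit as the stock."""
    return stock_conc * volume_ul / total


def program_minutes(cycles):
    """Hold time only: 94°C/1 min, cycles of (30 sec + 6 min), 68°C/8 min."""
    return 1 + cycles * (0.5 + 6) + 8


print("total volume (µl):", total_ul)
print("Mg(OAc)2 (mM):", round(final_conc(25, 2.2), 2))
print("dNTP (mM):", round(final_conc(2, 5.0), 2))
print("primary PCR (min):", program_minutes(35))
print("nested PCR (min):", program_minutes(25))
```

Under these assumptions the listed components add up to the stated 50 µl reaction volume, and the long 68°C step dominates the run time of both rounds.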

6. DNA sequencing to confirm the cloned DNA fragment

PCR products were purified from the agarose gel following electrophoresis and
cloned into a TA vector, pT7Blue™ (Novagen). DNA sequencing was performed by the
dideoxynucleotide chain termination method using a T7 Sequencing Kit purchased from
Pharmacia. Complete sequences of these promoter regions were obtained by automatic
sequencing using a dRhodamine Terminator Cycle Sequencing Ready Reaction Kit
(Perkin-Elmer) and an ABI 377 automatic sequencing machine.

The isolated cytokeratin DNA fragment comprising the gene promoter is 2.2 kb. In
the 3' proximal region immediately upstream of a portion identical to the 3' part of the CK
cDNA sequence, there is a putative TATA box perfectly matching a consensus TATA
box sequence. The 164 bp of the 3' region is identical to the 5' UTR (untranslated region)
of the cytokeratin cDNA. Thus, the isolated fragment was indeed derived from the same
gene as the cytokeratin cDNA clone (SEQ ID NO:7). Similarly, a 1.5 kb 5' flanking region
was isolated from the muscle creatine kinase gene, a putative TATA box was also found in
its 3' proximal region and the 3' region is identical to the 5' portion of the MCK cDNA
clone (SEQ ID NO:8). For MLC2f, a 2 kb region was isolated from the fast skeletal muscle
isoform of myosin light chain 2 gene and sequenced completely. The promoter sequence
for MLC2f is shown in SEQ ID NO:22. The sequence immediately upstream of the gene
specific primer M2 is identical to the 5' UTR of the MLC2f cDNA clone; thus, the
amplified DNA fragments are indeed derived from the MLC2f gene. A perfect TATA box
was found 30 nucleotides upstream of the transcription start site, which was defined by a
primer extension experiment based on Sambrook et al. (1989). In the 2-kb region
comprising the promoter, six E-boxes (CANNTG) and six potential MEF2 binding sites
[(C/T)TA(T/A)4TA(A/G)] were found and are indicated in SEQ ID NO:22. Both of these
cis-element classes are important for muscle specific gene transcription (Schwarz et al.,
1993; Olson et al., 1995). A 2.2 kb fragment was amplified for the ARP gene. By
alignment of its sequence with the ARP cDNA clone, a 1.3 kb intron was found in the 5'
UTR (SEQ ID NO:9). As a result, the isolated ARP promoter is within a DNA fragment
about 0.8 kb long.
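
The two consensus motifs named above, the E-box (CANNTG) and the MEF2 site (C/T)TA(T/A)4TA(A/G), can be located in a sequence with regular expressions. The following is a minimal sketch, not the method used to annotate SEQ ID NO:22; the demonstration sequence is hypothetical, and only the strand supplied is scanned.

```python
import re

# Lookahead patterns so that overlapping sites are all reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")          # CANNTG
MEF2 = re.compile(r"(?=([CT]TA[AT]{4}TA[AG]))")    # (C/T)TA(T/A)4TA(A/G)


def find_sites(pattern, seq):
    """Return (0-based start, matched text) for every match on the given strand."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical 30-base test sequence (illustration only).
demo = "GGCAGCTGAACTAAAAATAGCCCACGTGTT"

print("E-boxes:", find_sites(EBOX, demo))
print("MEF2 sites:", find_sites(MEF2, demo))
```

Because the MEF2 consensus is not its own reverse complement, a fuller scan would also search the reverse-complemented sequence; that step is omitted here for brevity.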

Example III: Generation of green fluorescent transgenic fish 

The isolated zebrafish gene promoters were inserted into the plasmid pEGFP-1
(Clontech), which contains an EGFP structural gene whose codons have been optimized
according to preferred human codons. Three promoter fragments were inserted into
pEGFP-1 at the EcoRI and BamHI sites and the resulting recombinant plasmids were named
pCK-EGFP (Fig. 4), pMCK-EGFP (Fig. 5), and pARP-EGFP (Fig. 6), respectively. The
promoter fragment for the MLC2f gene was inserted into the HindIII and BamHI sites of
the plasmid pEGFP-1 and the resulting chimeric DNA construct, pMLC2f-EGFP, is
diagrammed in Fig. 7.

Linearized plasmid DNAs at concentrations of 500 µg/ml (for pCK-EGFP and
pMCK-EGFP) and 100 ng/ml (for pMLC2f-EGFP) in 0.1 M Tris-HCl (pH 7.6)/0.25%
phenol red were injected into the cytoplasm of 1- or 2-cell stage embryos. Because of a
high mortality rate, pARP-EGFP was injected at a lower concentration (50 µg/ml). Each
embryo received 300-500 pl of DNA. The injected embryos were reared in autoclaved
Holtfreter's solution (0.35% NaCl, 0.01% KCl and 0.01% CaCl2) supplemented with 1
µg/ml of methylene blue. Expression of GFP was observed and photographed under a
ZEISS Axiovert 25 fluorescence microscope.
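
The amount of DNA delivered per embryo follows from the injected concentration and volume. The sketch below assumes that the concentrations for pCK-EGFP, pMCK-EGFP and pARP-EGFP are in µg/ml and that the injection volume is in picoliters; the pMLC2f-EGFP figure is left out because its unit is less certain in the text.

```python
def injected_mass_pg(conc_ug_per_ml, volume_pl):
    """Mass of DNA in picograms: 1 µg/ml equals 0.001 pg per picoliter."""
    return conc_ug_per_ml * volume_pl / 1000


# Concentrations (µg/ml) read from the text; volume range 300-500 pl per embryo.
constructs = {"pCK-EGFP": 500, "pMCK-EGFP": 500, "pARP-EGFP": 50}

for name, conc in constructs.items():
    low = injected_mass_pg(conc, 300)
    high = injected_mass_pg(conc, 500)
    print(f"{name}: {low:g}-{high:g} pg DNA per embryo")
```

On these assumptions, the lower pARP-EGFP concentration corresponds to a tenfold smaller DNA dose per embryo than was used for pCK-EGFP and pMCK-EGFP.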

When zebrafish embryos received pCK-EGFP, GFP expression started about 4
hours after injection, which corresponds to the stage of ~30% epiboly. About 55% of the
injected embryos expressed GFP at this stage. The early expression was always in the
superficial layer of cells, mimicking endogenous expression of the CK gene as observed by
in situ hybridization. At later stages, in all GFP-expressing fish, GFP was found
predominantly in skin epithelia. A typical pCK-EGFP transgenic zebrafish fry at 4 days old
is shown in Fig. 8.

Under the MCK promoter, no GFP expression was observed in early embryos
before muscle cells became differentiated. By 24 hpf, about 12% of surviving embryos
expressed GFP strongly in muscle cells and these GFP-positive embryos remained GFP-positive
after hatching. The GFP expression was always found in many bundles of muscle
fibers, mainly in the mid-trunk region, and no expression was ever found in other types of
cells. A typical pMCK-EGFP transgenic zebrafish fry (3 days old) is shown in Fig. 9.



Expression of pARP-EGFP was first observed 4 hours after injection at the 30%
epiboly stage. The timing of expression is similar to that of pCK-EGFP-injected embryos.
However, unlike the pCK-EGFP transgenic embryos, the GFP expression under the ARP
promoter occurred not only in the superficial layer of cells but also in deep layers of cells.
In some batches of injected embryos, almost 100% of the injected embryos expressed GFP
initially. At later stages when some embryonic cells became overtly differentiated, it was
found that the GFP expression occurred in essentially all different types of cells such as
skin epithelia, muscle cells, lens, neural tissues, notochord, circulating blood cells and yolk
cells (Fig. 10).

Under the MLC2f promoter, nearly 60% of the embryos expressed GFP. The
earliest GFP expression started in trunk skeletal muscles about 19 hours after injection,
which corresponds to the 20-somite stage. Later, the GFP expression also occurred in
head skeletal muscles including eye muscles, jaw muscles, gill muscles etc.

Transgenic founder zebrafish containing pMLC2f-EGFP emit a strong green
fluorescent light under a blue or ultraviolet light (Fig. 11A). When the transgenic founders
were crossed with wild-type fish, transgenic offspring were obtained that also displayed
strong green fluorescence (Fig. 11B). The level of GFP expression is so high in the
transgenic founders and offspring that green fluorescence can be observed when the fish
are exposed to sunlight.

To identify the DNA elements conferring the strong promoter activity in skeletal
muscles, deletion analysis of the 2-kb DNA fragment comprising the promoter was
performed. Several deletion constructs, which contain 5' deletions of the MLC2f promoter
upstream of the EGFP gene, were injected into the zebrafish embryos and the transient
expression of GFP in early embryos (19-72 hpf) was compared. To facilitate the
quantitative analysis of GFP expression, we define the level of expression as follows (Figs.
12A-12C):

Strong expression: GFP expression was detected in essentially 100% of the muscle fibers
in the trunk.

Moderate expression: GFP expression was detected in several bundles of muscle
fibers, usually in the mid-trunk region.

Weak expression: GFP expression occurred in dispersed muscle fibers and the
number of GFP positive fibers is usually less than 20 per embryo.

As summarized in Fig. 13, deletion up to -283 bp maintained the GFP expression
in skeletal muscles in 100% of the expressing embryos; however, the level of GFP
expression from these deletion constructs varies greatly. Strong expression drops from
23% to 0% from the 2-kb (-2011 bp) promoter to the -283-bp promoter. Thus, only two
constructs (-2011 bp and -1338 bp) are capable of maintaining the high level of expression
and the highest expression was obtained only with the 2-kb promoter, indicating the
importance of the promoter region of -1338 bp to -2011 bp for conferring the highest
promoter activity.

The expression of GFP using pMLC2f-EGFP is much higher than that obtained
using pMCK-EGFP, which contains 1.5 kb of the zebrafish MCK promoter. By the same
assay in transient transgenic zebrafish embryos, only about 12% of the embryos injected
with pMCK-EGFP expressed GFP. Among the expressing embryos, no strong expression
was observed, and 70% and 30% showed moderate and weak expression, respectively. In
comparison, about 60% of the embryos injected with pMLC2f-EGFP expressed GFP and
23%, 37% and 40% showed strong, moderate and weak expression, respectively.
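
The percentages above are stated relative to the expressing embryos. To compare the two constructs per injected embryo, they can be multiplied by the fraction of embryos that expressed GFP at all. The sketch below does this arithmetic, treating the approximate figures in the text ("about 12%", "about 60%") as exact for illustration.

```python
# Percentages taken from the text; "expressing" is relative to injected embryos,
# the three levels are relative to expressing embryos.
results = {
    "pMLC2f-EGFP": {"expressing": 60, "strong": 23, "moderate": 37, "weak": 40},
    "pMCK-EGFP": {"expressing": 12, "strong": 0, "moderate": 70, "weak": 30},
}


def per_injected(construct):
    """Percentage of all injected embryos showing each expression level."""
    r = results[construct]
    return {
        level: r["expressing"] * r[level] / 100
        for level in ("strong", "moderate", "weak")
    }


for name in results:
    print(name, per_injected(name))
```

On this reading, roughly 14% of all embryos injected with pMLC2f-EGFP showed strong expression, whereas none of those injected with pMCK-EGFP did.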

Example IV: Potential applications of fluorescent transgenic fish 

The fluorescent transgenic fish have use as ornamental fish in the market. Stably
transgenic lines can be developed by breeding a GFP transgenic individual with a wild type
fish or another transgenic fish. By isolation of more zebrafish gene promoters, such as
eye-specific, bone-specific, tail-specific etc., and/or by classical breeding of these transgenic
zebrafish, more varieties of fluorescent transgenic zebrafish can be produced. Previously,
we have reported isolation of over 200 distinct zebrafish cDNA clones homologous to
known genes (Gong et al., 1997). These isolated clones code for proteins in a variety of
tissues and some of them are inducible by heat-shock, heavy metals, or hormones such as
estrogens. By using the method of PCR amplification using gene-specific primers
designed from the nucleotide sequences of these cDNAs, and the linker-specific primers
described herein, the promoters of the genes represented by the cDNAs of Gong et al. can
be used in the present invention. Thus, other tissue-specific promoters, hormone-inducible
promoters, heavy-metal inducible promoters and the like from zebrafish can be isolated
and used to make fluorescent zebrafish (or other fish species) that express a GFP or variant
thereof, in response to the relevant compound.

Multiple color fluorescent fish may be generated by the same technique, as blue
fluorescent protein (BFP) gene, yellow fluorescent protein (YFP) gene and cyan
fluorescent protein (CFP) gene are available from Clontech. For example, a transgenic
fish with GFP under an eye-specific promoter, BFP under a skin-specific promoter, and
YFP under a muscle-specific promoter will show the following multiple fluorescent colors:
green eyes, blue skin and yellow muscle. By recombining different tissue specific
promoters and fluorescent protein genes, more varieties of transgenic fish of different
fluorescent color patterns will be created. By expression of two or more different
fluorescent proteins in the same tissue, an intermediate color may be created. For example,
by expression of both GFP and BFP under a skin-specific promoter, a dark-green skin color
may be created.
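
The combinatorial point above can be made concrete by enumerating promoter and fluorescent protein pairings. The sketch below is illustrative only: it assumes three tissue-specific promoters and the four fluorescent proteins named in the text, and it counts only patterns in which each tissue carries a different protein.

```python
from itertools import permutations

# Tissues with a specific promoter, and the fluorescent proteins named in the text.
tissues = ("eye", "skin", "muscle")
proteins = ("GFP", "BFP", "YFP", "CFP")

# Every assignment of three distinct proteins to the three tissues.
patterns = [dict(zip(tissues, combo)) for combo in permutations(proteins, 3)]

print(len(patterns), "distinct color patterns")
print(patterns[0])
```

Allowing the same protein in more than one tissue, or two proteins in one tissue to give an intermediate color, would enlarge this set further.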

By using a heavy metal- (such as cadmium, cobalt, chromium) inducible or
hormone- (such as estrogen, androgen or other steroid hormone) inducible promoter, a
biosensor system may be developed for monitoring environmental pollution and for
evaluating water quality for human consumption and aquacultural uses. In such a biosensor
system, the transgenic fish will glow with a green fluorescence (or other color depending
on the fluorescent protein gene used) when pollutants such as heavy metals and estrogens
(or their derivatives) reach a threshold concentration in an aquatic environment. Such a
biosensor system has advantages over classical analytical methods because it is rapid,
visualizable, and capable of identifying specific compounds directly in the complex mixtures
found in an aquatic environment, and is portable or less instrument dependent. Moreover,
the biosensor system also provides direct information on biotoxicity and it is biodegradable
and regenerative.

Environmental monitoring of several substances can be accomplished by
creating one transgenic fish having genes encoding different colored fluorescent proteins,
each driven by a promoter responsive to one of the substances. The particular colors
exhibited by the fish in an environment can then be observed. Alternatively, a number of
fish can be transformed with individual vectors, and the fish can then be combined into a
population for monitoring an environment and the colors expressed by each fish observed.

In addition, the fluorescent transgenic fish should also be valuable in the market for
scientific research tools because they can be used for embryonic studies such as tracing cell
lineage and cell migration. Cells from transgenic fish expressing GFP can also be used as
cellular and genetic markers in cell transplantation and nuclear transplantation
experiments.

The chimeric gene constructs demonstrated successfully in zebrafish in the present
invention should also be applicable to other fish species such as medaka, goldfish, carp
including koi, loach, tilapia, glassfish, catfish, angel fish, discus, eel, tetra, goby, gourami,
guppy, Xiphophorus (swordtail), hatchet fish, Molly fish, pangasius, etc. The promoters
described herein can be used directly in these fish species. Alternatively, the homologous
gene promoters from other fish species can be isolated by the method described in this
invention. For example, the isolated and characterized zebrafish cDNA clones and
promoters described in this invention can be used as molecular probes to screen for
homologous promoters in other fish species by molecular hybridization or by PCR.
Alternatively, one can first isolate the zebrafish cDNA and promoters based on the
sequences presented in SEQ ID NOS:1, 3, 5, 7, 8, 9, 20 and 22 or using data from other
sequences of cDNAs disclosed by Gong et al. (1997), by PCR and then use the zebrafish
gene fragments to obtain homologous genes from other fish species by the methods
mentioned above.

In addition, a strong muscle-specific promoter such as MLC2f is valuable to direct 
a gene to be expressed in muscle tissues for generation of other beneficial transgenic fish. 
For example, transgenic expression of a growth hormone gene under the muscle-specific 
promoter may stimulate somatic growth of transgenic fish. Such DNA can be introduced 
either by microinjection, electroporation, or sperm carrier to generate germ-line transgenic 
fish, or by direct injection of naked DNA into skeletal muscles (Xu et al., 1999) or into 
other tissues or cavities, or by a biolistic method (gene bombardment or gene gun) 
(Gomez-Chiarri et al., 1996).